Method for determining the position of text lines in text
recognition tasks

The invention relates to a method for determining the position
of text lines in text recognition tasks, whereby the
brightness distribution of an acquired image excerpt along the
vertical is determined by histogram formation along the lines,
and this brightness distribution is smoothed, whereby the
maximum value and the minimum value of the function obtained in
this way are determined, and thresholds that serve as the basis
for distinguishing between text line and text interspace are
calculated on the basis of these extremes.

In the case of the automatic recognition of texts, that is to 
say in the case of the conversion of the graphical information 
of a document into text characters which can be further 
processed by means of electronic text processing programs, an
essential prerequisite for a successful recognition operation 
is that the position and the size of the individual characters 
be determined accurately. This presupposes in turn that the 
position and the dimensions of the text lines be known. 

In the case of manually guided readers, moreover, the profile
of the text lines in the captured image excerpt turns out to 
be non-linear. In this context, there is a need to determine 
the profile of a text line. 

A method of the type mentioned at the outset is disclosed by
EP 0702 329 A2. This publication discloses a method and an
apparatus for determining the line course in handwritten
documents. According to this publication, the picture
elements are summed up line-by-line, smoothed and analyzed for
the determination of the position of the lines.



The invention is based on the object of improving this method.

This is done according to the invention by means of a method
of the type mentioned in the introduction wherein a line
interspace is identified when the function comprises a
combination of a maximum with a minimum, whereby the minimum
has a value less than

function minimum + (number of picture elements over the width
of the image excerpt)/15 + 2 * (number of picture elements over
the width of the image excerpt)/15 * function maximum / (number
of picture elements over the width of the image excerpt)

and the drop of the function values after the maximum is
greater than (function maximum - function minimum)/2. This
embodiment has proven itself in practice, yielding very good
results.

Advantage is afforded by a refinement of the method wherein, in
order to ascertain the left-hand edge of a line, the
brightness distribution of a captured image excerpt along the
horizontal is determined and the function obtained in this way
represents the beginning of a line by an abrupt rise in the
function value. The beginning of a line can thus be determined
in a simple manner with little complexity. Furthermore, for
the determination of the position of the text lines, it can be
ensured that in this case only images which actually contain
text lines are taken into consideration and a user error, such
as e.g. positioning the reading pen too far to the left of the
beginning of a line, does not influence the determination of
the line.



It is expedient if, after the position of a line has initially
been ascertained, the further course of that line is
determined by evaluating the information concerning the text
characters recognized. Evaluating the results of the
character classification enables the line profile to be
determined particularly accurately.

The invention is explained in more detail with reference to
figures, in which, by way of example:

Figure 1 shows a text excerpt of the kind that is typically
captured by a manually guided reader, and also the histogram
determined therefrom and
Figure 2 shows the filtered histogram with the parameters
entered for the assessment of the image.

The sequence of the method according to the invention is as
follows:

A line histogram is determined for the captured image
excerpt. In this case, for each line, the values of all the
pixels of this line (0 for white and 1 for black) are summed.
The result is a function f(y) with

\[ f(y) = \sum_{i=0}^{\mathrm{Width}-1} \mathrm{BlackPixel}(i, y) \]

where

y        denotes the line index of the image
Width    indicates the width (number of columns) of the
         image excerpt
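
By way of illustration only, the line histogram can be sketched in Python as follows. The function name line_histogram and the representation of the image excerpt as a list of rows of 0/1 values are assumptions made for this sketch and are not taken from the description.

```python
def line_histogram(image):
    """Return f(y): the number of black pixels in each line (row).

    `image` is assumed to be a list of rows, each row a list of
    pixel values with 0 for white and 1 for black.
    """
    return [sum(row) for row in image]


# Tiny example excerpt: three lines, four columns.
excerpt = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
print(line_histogram(excerpt))
```

Each entry of the returned list corresponds to one line index y of the image excerpt.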

When a text is present, this function has a typical profile
as illustrated by way of example in Figure 1. In a further
step, filtering is carried out in accordance with

\[ f'(y) = \sum_{i=-5}^{+5} f(y+i) \cdot G(i) \]

where

y    index in the line histogram
G    weighting corresponding to an exponential smoothing
     curve
i    index of the smoothing curve
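
The filtering step may be sketched as below, reading the filter as a weighted sum of f(y+i) over i = -5 to +5. The description states only that G corresponds to an exponential smoothing curve; the concrete weights exp(-decay*|i|), their normalisation to a sum of 1, the decay value and the clamping of indices at the image borders are all assumptions of this sketch.

```python
import math


def smooth_histogram(f, radius=5, decay=0.5):
    """Return f'(y) = sum over i in [-radius, radius] of f(y+i) * G(i).

    Assumptions (not from the description): G(i) = exp(-decay * |i|),
    normalised so the weights sum to 1, and indices outside the
    histogram are clamped to the nearest valid line.
    """
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-decay * abs(i)) for i in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]

    n = len(f)
    smoothed = []
    for y in range(n):
        acc = 0.0
        for i, g in zip(offsets, weights):
            j = min(max(y + i, 0), n - 1)  # clamp at the borders
            acc += f[j] * g
        smoothed.append(acc)
    return smoothed


filtered = smooth_histogram([0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0])
value_max, value_min = max(filtered), min(filtered)
```

The extremes Valuemax and Valuemin can then be read off the filtered values, as in the last line of the sketch.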



During the filtering operation, the absolute maximum Valuemax,
i.e. the number of black pixels of the darkest line, and the
absolute minimum Valuemin, i.e. the number of black pixels in
the brightest line, are also determined.

Parameters for the assessment of the image are derived from
these two values. The said parameters are:

Trough limit = (Valuemax - Valuemin)/2
but at least number of pixels over the width of the image
excerpt/30

Minima edge = Valuemin + number of pixels over the width of
the image excerpt/15
but at most 2 * number of pixels over the width of the image
excerpt/15

Minima threshold = Minima edge + (2 * number of pixels over
the width of the image excerpt/15 * (Valuemax / number of
pixels over the width of the image excerpt))
but at most 3 * number of pixels over the width of the image
excerpt/15
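
The three parameters can be computed as in the following sketch. The function and variable names are chosen for illustration, and the use of real-valued rather than integer division is an assumption, since the description does not state how fractions are handled.

```python
def assessment_parameters(value_max, value_min, width):
    """Derive Trough limit, Minima edge and Minima threshold.

    `width` is the number of pixels over the width of the image
    excerpt. Real-valued division is an assumption of this sketch.
    """
    # (Valuemax - Valuemin)/2, but at least width/30
    trough_limit = max((value_max - value_min) / 2, width / 30)
    # Valuemin + width/15, but at most 2*width/15
    minima_edge = min(value_min + width / 15, 2 * width / 15)
    # Minima edge + 2*width/15 * (Valuemax/width), but at most 3*width/15
    minima_threshold = min(
        minima_edge + (2 * width / 15) * (value_max / width),
        3 * width / 15,
    )
    return trough_limit, minima_edge, minima_threshold


trough, edge, threshold = assessment_parameters(120, 10, 300)
```

For an excerpt 300 pixels wide with Valuemax = 120 and Valuemin = 10, this gives a Trough limit of 55, a Minima edge of 30 and a Minima threshold of about 46.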

Using the function f'(y) and the threshold values determined,
as are illustrated by way of example in Figure 2, the
captured image is then assessed with regard to the presence
of text lines and line interspaces.

For this purpose, the curve profile is examined to see
whether values which are smaller than the parameter Minima
threshold are present. If this is the case, then the relevant
area is qualified as a valid minimum and thus as a possible
line interspace.

An actual line interspace is present, however, only when the
presence of a text line is indicated by an adjoining maximum
with a certain characteristic value. These valid maxima are
defined by a subsequent decrease of the curve value by a
magnitude > Trough limit.

The coincidence of a valid maximum with a valid minimum 
characterizes the transition from a text line to a line 
interspace. The parameter Minima edge serves for accurately 
determining this transition.

The point at which the curve intersects this threshold 
between a valid maximum and a valid minimum is defined as a 
line edge. 
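
One possible reading of this assessment is sketched below for the case of a maximum followed by a minimum when scanning the filtered curve from top to bottom. The function name find_line_edges, the single-pass scanning strategy and the restriction to the lower line edge are assumptions of this sketch; the description does not prescribe a particular scanning procedure.

```python
def find_line_edges(curve, trough_limit, minima_edge, minima_threshold):
    """Return the indices where a text line passes into an interspace.

    A valid maximum is a peak after which the curve drops by more
    than `trough_limit`; a valid minimum is a value below
    `minima_threshold`. The edge is taken as the first index after
    the peak at which the curve is at or below `minima_edge`.
    """
    edges = []
    peak = None      # highest value seen since the last interspace
    crossing = None  # first index at or below minima_edge after the peak
    for y, value in enumerate(curve):
        if peak is None or value > peak:
            peak = value
            crossing = None
        if crossing is None and value <= minima_edge:
            crossing = y
        valid_maximum = peak - value > trough_limit
        valid_minimum = value < minima_threshold
        if valid_maximum and valid_minimum and crossing is not None:
            edges.append(crossing)
            peak = None
            crossing = None
    return edges


curve = [0, 0, 50, 100, 100, 50, 20, 0, 0, 60, 100, 40, 5, 0]
edges = find_line_edges(curve, trough_limit=50, minima_edge=30,
                        minima_threshold=46)
```

With the example curve, which contains two text lines, the sketch reports one line edge after each of them (indices 6 and 12).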


In order to determine the left-hand edge of a line, a column
histogram is created in accordance with

\[ f(x) = \sum_{i=0}^{\mathrm{Height}-1} \mathrm{BlackPixel}(x, i) \]

x         column index of the image excerpt
Height    image height

In words, the colour information of the pixels of each column
of the captured image excerpt is summed. The left-hand text
edge is defined (given the presence of at least one line) by
an abrupt rise in the function value f(x).
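
A minimal sketch of the column histogram and of the search for the left-hand text edge follows. The description does not quantify what counts as an abrupt rise; the parameter min_rise (the required increase between two adjacent columns) and both function names are therefore assumptions of this sketch.

```python
def column_histogram(image):
    """Return f(x): the number of black pixels in each column.

    `image` is assumed to be a list of rows of 0/1 pixel values.
    """
    return [sum(column) for column in zip(*image)]


def left_text_edge(image, min_rise=2):
    """Return the first column where f(x) rises by at least `min_rise`.

    Returns None if no such rise is found. The threshold `min_rise`
    is a hypothetical tuning parameter.
    """
    previous = 0
    for x, value in enumerate(column_histogram(image)):
        if value - previous >= min_rise:
            return x
        previous = value
    return None


excerpt = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
print(left_text_edge(excerpt))
```

In practice the threshold would presumably be chosen in relation to the line height already determined, so that isolated noise pixels do not trigger an edge.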

The follow-up plotting of the lines, that is to say the
information concerning the further profile of the lines,
which is important particularly in the case of manually
guided readers on account of the fluctuations that occur with
the latter, is effected on the basis of the position of the
recognized characters.

For this purpose, the recognized characters are classified
into the following size groups:

Small characters (for example "a"): 0.7*line height
Large characters (for example "A", "g"): line height
Oversize characters (for example "[", "j"): line height
+0.3*line height (descenders)
Special characters: the characters cannot be unambiguously
assigned by size.
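
How a line height might be derived from a single recognized character on the basis of these size groups is sketched below. The lookup table contains only the example characters named above; the table itself, the function name and the idea of dividing the measured character height by the group factor are assumptions of this sketch.

```python
# Relative character height per size group, as a fraction of the line height.
SIZE_FACTORS = {"small": 0.7, "large": 1.0, "oversize": 1.3}

# Hypothetical lookup; only the example characters from the text are listed.
SIZE_GROUPS = {
    "a": "small",
    "A": "large",
    "g": "large",
    "[": "oversize",
    "j": "oversize",
}


def line_height_from_character(char, char_height):
    """Estimate the line height from one character's measured height.

    Returns None for special characters, i.e. characters that cannot
    be unambiguously assigned to a size group.
    """
    group = SIZE_GROUPS.get(char, "special")
    factor = SIZE_FACTORS.get(group)
    if factor is None:
        return None
    return char_height / factor


print(line_height_from_character("a", 14))
```

A small character measured at 14 pixels thus suggests a line height of about 20 pixels, whereas a special character contributes no estimate.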

The following character groups are differentiated for the 
determination of the new lower edge of the text line: 


Baseline characters (for example "A", "."): the lower edge of 
the character corresponds to the lower edge of the text line, 
irrespective of the size of the character; 

Descender characters (for example "g", "["): the lower edge 
of the character corresponds to the descender boundary,
irrespective of the size of the character; 

Special characters: these characters cannot be unambiguously 
assigned with regard to their lower edge. 

On the basis of these assignments and a probability value G
relating to the correct classification of the character, the 
said probability value being obtained in the course of the 
classification method, the new line height Height is then 
determined as follows: 



\[ G = \mathrm{Probability} \cdot \mathrm{CYC\_MAX\_WEIGHT} \]

\[ \mathrm{Height} = \frac{\sum_{i=0}^{\mathrm{CYC\_MAX\_EXTRPAR}-1} \mathrm{OldHeight}[i] + \mathrm{NewHeight} \cdot G}{\mathrm{CYC\_MAX\_EXTRPAR} + G} \]

G                  weighting of the line height derived
                   from the current character
Probability        probability of correct character
                   classification (range of values between 0 and 1)
CYC_MAX_WEIGHT     maximum weighting of the new
                   character position (for example: 5)
Height             subsequently plotted line height
                   (upper case letter height)
CYC_MAX_EXTRPAR    size of the ring buffer for the
                   averaging (for example: 3)
OldHeight[]        ring buffer
NewHeight          line height derived from the current
                   character (upper case letter height)
i                  index in the ring buffer
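
The weighted averaging of the line height may be sketched as follows, reading the formula as the sum of the ring buffer entries plus NewHeight * G, divided by CYC_MAX_EXTRPAR + G. The constants take the example values given in the description. Storing the newly plotted Height back into the ring buffer is an assumption of this sketch, as the description does not state which value is written to OldHeight.

```python
from collections import deque

CYC_MAX_WEIGHT = 5    # example value from the description
CYC_MAX_EXTRPAR = 3   # example ring buffer size from the description


def update_line_height(old_heights, new_height, probability):
    """Return the subsequently plotted line height.

    `old_heights` must hold exactly CYC_MAX_EXTRPAR entries.
    Appending the result to the buffer is an assumption.
    """
    g = probability * CYC_MAX_WEIGHT
    height = (sum(old_heights) + new_height * g) / (CYC_MAX_EXTRPAR + g)
    old_heights.append(height)  # the oldest entry drops out (maxlen)
    return height


history = deque([20.0, 20.0, 20.0], maxlen=CYC_MAX_EXTRPAR)
height = update_line_height(history, new_height=26.0, probability=0.6)
```

A character classified with low probability thus shifts the plotted height only slightly, while a confidently classified character can outweigh the buffered history.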

The profile of the lower edge of the text line is determined 
in accordance with: 



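\[ G = \left( \mathrm{Probability} + \frac{1}{\mathrm{CYC\_MAX\_WEIGHT}} \right) \cdot \mathrm{CYC\_MAX\_WEIGHT} \]

\[ \mathrm{Increase} = \frac{\mathrm{OldIncrease} + \mathrm{NewIncrease} \cdot G}{1 + G} \]

\[ \mathrm{Base} = \mathrm{NewBase} + \frac{\mathrm{Increase} \cdot \mathrm{DeltaX} + 50}{100} \]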



G                  weighting of the new character position
Probability        probability of correct character
                   classification
CYC_MAX_WEIGHT     maximum weighting of the new character
                   position (for example: 5)
Increase           subsequently plotted current gradient of
                   the baseline in %
OldIncrease        previous gradient of the baseline in %
NewIncrease        gradient of the baseline in % calculated
                   from the position of the current character
Base               subsequently plotted baseline position
                   (rounded to an integer value)
NewBase            baseline position calculated from the
                   position of the current character
DeltaX             X-separation in the image between the two
                   centre points of the characters extracted last.

The "Increase" is limited by the plausibility limit
CYC_MAX_LINEOFFSET (in the Pocket Reader: 15%).
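
The follow-up plotting of the baseline may be sketched as below. The sketch assumes the reading G = (Probability + 1/CYC_MAX_WEIGHT) * CYC_MAX_WEIGHT, Increase = (OldIncrease + NewIncrease * G) / (1 + G) and Base = NewBase + (Increase * DeltaX + 50) / 100 of the formulas, where the term + 50 rounds the percentage product to an integer. Applying the plausibility limit symmetrically (plus or minus 15%) and using floor division for the rounding are further assumptions.

```python
CYC_MAX_WEIGHT = 5       # example value from the description
CYC_MAX_LINEOFFSET = 15  # plausibility limit in percent


def update_baseline(old_increase, new_increase, new_base, delta_x, probability):
    """Return (Increase, Base) for the subsequently plotted baseline.

    Gradients are given in percent. The symmetric clamping and the
    floor-based rounding are assumptions of this sketch.
    """
    g = (probability + 1 / CYC_MAX_WEIGHT) * CYC_MAX_WEIGHT
    increase = (old_increase + new_increase * g) / (1 + g)
    # Limit the gradient to the plausibility range.
    increase = max(-CYC_MAX_LINEOFFSET, min(CYC_MAX_LINEOFFSET, increase))
    # Percent gradient times X-separation, rounded via the +50 term.
    base = new_base + int((increase * delta_x + 50) // 100)
    return increase, base


increase, base = update_baseline(
    old_increase=0.0, new_increase=10.0, new_base=100,
    delta_x=20, probability=0.8,
)
```

In this example the new gradient of 10% is damped to roughly 8.3% by the previous gradient of 0%, and the baseline is moved by 2 pixels over an X-separation of 20 pixels.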






